ABSTRACT

In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ~92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ~95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.

Subject headings: catalogs — methods: statistical — X-rays: general 


1. Introduction 

Observational astronomy has entered a new era of large surveys that will produce incredible amounts of data at rates that are pushing beyond the limits of our ability to process in real time. Coinciding with this flood is an ever growing mountain of archival data that is increasingly under-utilised. Intelligent methods to quickly and accurately identify astrophysical sources are needed, with machine learning algorithms proving to be very effective in this respect.

The Random Forest machine learning algorithm (hereafter referred to as RF) has shown great promise in the automatic classification of variable stars (Richards et al. 2011; Dubath et al. 2011), the photometric classification of supernovae (Carliles et al.), and most recently the automatic classification of variable X-ray sources (Lo et al. 2014). RF is an ensemble supervised classification algorithm that builds a forest of decision trees, each grown from a bootstrap sample of a training set of sources with known classifications (Breiman 2001). It is one of the most accurate classification algorithms available (Caruana & Niculescu-Mizil 2006), is extremely fast, and can handle large data sets with a large number of features.^1 In addition, there are only two parameters that the user needs to specify^2 - the number of randomly selected features used at each node within the decision tree, and the number of trees in the forest - making it extremely easy to use.

ARC Centre of Excellence for All-Sky Astrophysics (CAASTRO)


In previous work (Lo et al. 2014) we investigated the feasibility of using RF to automatically classify the variable X-ray sources in the second data release of the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2; Watson et al. 2009). At the time of its release in August 2008, 2XMMi-DR2 was the largest X-ray source catalog ever produced. We used the sample of 2XMMi-DR2 variable sources that had been manually classified (Farrell et al. in prep) as a training set for the RF classifier, obtaining an accuracy (evaluated through 10-fold cross-validation^3) of ~97%. Lo et al. (2014) also demonstrated the capability of RF to identify outlier sources that may represent rare new source populations, a stated scientific goal of most astronomical surveys. The training set comprised sources belonging to 7 categories: active galactic nuclei (AGN), cataclysmic variables (CVs), gamma ray bursts (GRBs), super soft sources (SSSs), stars, ultra luminous X-ray sources (ULXs), and X-ray binaries (XRBs).

The Third XMM-Newton Serendipitous Source Catalog (3XMM-DR4, hereafter referred to simply as 3XMM; Rosen et al. 2015) was released in July 2013 and contains 531,261 detections of 372,728 unique sources, of which 3,696 are flagged as variable. 3XMM represents a ~40% increase in unique sources over 2XMMi-DR3, and a ~63% increase in variable sources over 2XMMi-DR2. 3XMM was constructed from 7,427 XMM-Newton observations with the European Photon Imaging Cameras (EPIC) performed between 3 February 2000 and 8 December 2012, and is the largest X-ray source catalog so far released.

^1 In machine learning a feature is a measurable property of the object being classified, either a real number or a categorical label.

^2 There are more parameters that can be specified (e.g. it is possible to prune the trees, stop splitting once a particular node size is reached, or require that a minimum number of sources must be present in any given leaf), however only the two parameters described here are required. In this work we grew the trees fully without pruning.

^3 The training set is divided into 10 subsets. The model is trained with nine subsets and used to classify the remaining subset, and this is repeated for the 10 different combinations. The accuracy is the total number of correctly classified samples divided by the total number of samples in the training set.


In this paper we present a catalog of 3XMM variable sources that have been classified into six source categories using the RF classifier (hereafter referred to as the source class classification). As this is a serendipitous catalog, we would not expect a difference in the composition of sources in 2XMMi-DR2 versus 3XMM. We thus employed the same sample of 2XMMi-DR2 variable sources as used in Lo et al. (2014) for a training set. We also present the results of a study into the effectiveness of using the RF classifier to discriminate between spurious and real sources in the 3XMM catalog (hereafter referred to as the quality control classification).


2. Data Preparation & Feature Selection


Each unique source in both our training sets and our sample of unknown sources has multiple detections and thus a number of sets of X-ray features. In each observation a source may be detected by one or all of the three EPIC cameras, which in turn may have multiple exposures within a given observation (each with a unique light curve, although the other 3XMM features are the same for all exposures for a given camera within an observation). In addition, a number of fields were observed more than once, providing additional detections taken at different epochs. We treated each detection independently in both the training and test sets and thus classified each detection separately. However, we combined the separate classifications from each detection to provide an overall classification (see §3 for details of how this was performed).

With a few exceptions we used the same features for our classification as we used in Lo et al. (2014). From 3XMM we took the four hardness ratios and errors, the Galactic latitude and longitude, the 0.2-12 keV (i.e. band 8) flux, the source extent (in arcseconds) and the (maximum) likelihood of the source being extended, the distance to the nearest neighbour in 3XMM, the source quality flag, and the confusion flag. The hardness ratios are defined as the ratio of count rates in two adjacent bands (normalised so as to always be between -1 and +1). They provide information on the crude shape of the X-ray spectrum and can thus be a powerful discriminator between different X-ray emission mechanisms and thus different source types. However, when a source is not detected in either band used to calculate a hardness ratio, the resulting value is essentially a random number between -1 and +1. We therefore set the hardness ratio to a flag value of -10 and the error to 0 when both count rates used to calculate the hardness ratio were within 3σ of zero. To generate the timing features, we analysed the 3XMM light curves using the same methods as in Lo et al. (2014) after filtering out all points that lie outside the light curve good time intervals (GTIs). We searched for periodic variability (using the generalised Lomb-Scargle periodogram from Zechmeister & Kürster 2009), power law decays, and flares (using the Bayesian blocks technique; Scargle 1998), and also extracted a range of statistical features. A detailed description of how these features were extracted is given by Lo et al. (2014). Table 1 provides a complete list of the X-ray and timing features included in our classification.
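The hardness ratio flagging described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the form HR = (hard - soft) / (hard + soft) is assumed here as the usual normalised ratio of two adjacent band count rates; the catalog hardness ratio error is simply passed through unless the flag is applied.

```python
HR_FLAG = -10.0  # flag value quoted in the text for an unconstrained hardness ratio


def hardness_ratio(rate_soft, rate_hard):
    """Normalised ratio of two adjacent-band count rates (assumed form)."""
    return (rate_hard - rate_soft) / (rate_hard + rate_soft)


def flagged_hardness_ratio(rate_soft, err_soft, rate_hard, err_hard,
                           hr_err, n_sigma=3.0):
    """Return (HR, HR error), replaced by (-10, 0) when neither band is detected.

    A band counts as undetected when its count rate lies within
    n_sigma times its error of zero.
    """
    soft_undetected = abs(rate_soft) < n_sigma * err_soft
    hard_undetected = abs(rate_hard) < n_sigma * err_hard
    if soft_undetected and hard_undetected:
        return HR_FLAG, 0.0
    return hardness_ratio(rate_soft, rate_hard), hr_err


if __name__ == "__main__":
    # Neither band significant: the flag replaces the meaningless ratio.
    print(flagged_hardness_ratio(0.001, 0.001, 0.002, 0.001, hr_err=0.3))
    # Both bands detected: the ratio is kept.
    print(flagged_hardness_ratio(0.1, 0.01, 0.3, 0.01, hr_err=0.05))
```

Placing the flag well outside the physical range of -1 to +1 lets a decision tree isolate the unconstrained values with a single split.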

In addition to the X-ray features, we cross-matched our variable source sample against multi-wavelength catalogs. We used the Naval Observatory Merged Astrometric Dataset (NOMAD; Zacharias et al. 2004) for optical and near-infrared matches and the NRAO VLA Sky Survey (NVSS; Condon et al. 1998), the Sydney University Molonglo Sky Survey (SUMSS; Mauch et al. 2003), and the Second Epoch Molonglo Galactic Plane Survey (MGPS-2; Murphy et al. 2007) to search for radio counterparts, using the 3σ errors as the match criterion. When multiple counterparts were found we took the closest match as the correct one. Magnitudes in the BVR optical and JHK near-infrared bands, radio flux densities, as well as B-V, V-R, J-H, and H-K colors were provided for those sources for which a counterpart was found. We also calculated X-ray to optical, X-ray to near-infrared, and X-ray to radio flux ratios for each band as well as the probability of a chance cross-match using the Bayesian method from Budavari & Szalay (2008). We note that the way that our training set sample was constructed (i.e. identifying 2XMM sources by matching against the SIMBAD and NED data bases) creates a bias towards brighter well known sources. As such, a higher proportion of our training set sources have multi-wavelength matches than the overall 3XMM variable source sample, potentially leading the model to confuse fainter sources that do not have a match due to the limited sensitivity of the catalogs with brighter sources that by their nature do not have a multi-wavelength match. In an attempt to counteract this bias we set the multi-wavelength features to flag values of -1 x 10^ for those sources where no counterpart was found, so that the model will down-weight the importance of the multi-wavelength properties for fainter sources.^4 Similarly, for those sources where a counterpart was identified but magnitudes, fluxes, and/or colors were missing we set the relevant missing values to -1 x 10^. We chose flag values far outside the parameter space so that when SMOTE oversampling (see below) is employed the flag values will still remain significantly removed from the true parameter space.


In Lo et al. (2014) we cross-matched our sample against the Third Reference Catalog of galaxies (RC3; de Vaucouleurs et al. 1991) in order to identify which sources were potentially extragalactic. RC3 contains ~23k galaxies within a distance < 600 Mpc, with a mean distance of ~40 Mpc and a standard deviation of ~50 Mpc. For this work we instead used a sample of ~1.4M galaxies < 65 Gpc extracted from the NASA Extragalactic Database (NED) that had angular sizes and distances, which has a mean distance of ~2 Gpc and a standard deviation of ~4 Gpc. We cross-matched our sample against this NED galaxy catalog using the 3σ X-ray source position and the galaxy D25 ellipse as the match criteria. When multiple matches were found we took the galaxy closest to the 3XMM position as the correct match. The features that were included for the galaxy cross-match are: a boolean flag indicating whether or not a match was found, the angular separation between the 3XMM source position and the galaxy centre, the ratio of the source/galaxy angular separation and the elliptical radius of the galaxy in the direction of the source (a), and the log of the luminosity (calculated from the 0.2-12 keV 3XMM flux and the galaxy distance). As for the multi-wavelength matches described above, we set flag values for sources where no match was identified in order to counteract the bias towards brighter sources in our training sample. For sources where no match was found, we set the angular separation and a to 1 x 10^ and log(Lx) to -1 as flags. Table 2 provides a complete list of the multi-wavelength and galaxy match features included in our classification.

^4 We note that there are numerous techniques for imputing values when data are missing from a sample. However, the lack of a multi-wavelength counterpart could either indicate that no counterpart is present (thus providing useful information about the nature of the source) or simply be due to the limited sensitivity of the multi-wavelength catalogs (which provides no information gain). We experimented with the imputation function provided in the missForest (Stekhoven & Buehlmann 2012) R library and found that while the overall classification accuracy did not vary significantly when missing data values were imputed, the accuracy for the minority GRB class dropped from 46% to 13% when imputation was employed.


3. Methodology 


As in Lo et al. (2014) we used the R package (R Core Team 2013) randomForest (Liaw & Wiener 2002) for our classification. This is an open source package that is freely available to the community, not proprietary code that we have written ourselves. To determine the optimal number of features we used the function tuneRF in randomForest, iterating through a range of values between 2 and 20 for the number of features (with the number of trees set to 100) and comparing the out of bag (OOB) error for each run until a plateau was reached. We found the optimal number of features to be 15 for the source class classification and 16 for the quality control classification. To determine the optimal number of trees we again used tuneRF with the number of features set at the optimal value but varying the number of trees in each run, again until a plateau in the OOB error was reached. For both classification runs the optimal number of trees was found to be 500.

To evaluate the accuracy of our classifiers we used the same 10-fold cross-validation method as outlined in Lo et al. (2014). In our previous work we estimated an accuracy of ~97% for classifying the 2XMMi-DR2 variable sources (Lo et al. 2014). However, we used the entire training set for the 10-fold cross-validation, which unintentionally introduced a bias into the accuracy estimation.^5 Our training and test sets for the cross-validation comprised randomly selected detections, without cross-registration between detections by unique source number. As such, it is possible to have rows in both the training set and the sample to be classified that correspond to the same source and have almost identical features.^6 If this happens the classifier is essentially classifying data in the test sample using a model built from (almost) the same data, thus producing an unrealistically high classification accuracy. To avoid this bias we randomly selected sources for the training and test sets for the cross-validation by unique source number, thus ensuring that all the detections for each unique source will either be in the training set or the test set, but never in both. However, we used the entire training set (i.e. with multiple rows per source per observation) to build the model for the classification of the unknown source sample.

^5 We note that this does not affect the classification of the unknown variable sources in 2XMMi-DR2 presented in Lo et al. (2014), just the accuracy value reported.
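The source-level split used for the cross-validation can be sketched as below. This is a minimal illustration rather than the code used for the catalog (which was built in R); the function name and the round-robin assignment of shuffled sources to folds are assumptions made for the example.

```python
import random


def grouped_kfold(source_ids, n_folds=10, seed=0):
    """Yield (train_rows, test_rows) index lists for k-fold cross-validation.

    source_ids holds one entry per detection (row). Every detection of a
    given unique source lands in the same fold, so no source can appear in
    both the training rows and the test rows of any split.
    """
    unique_sources = sorted(set(source_ids))
    random.Random(seed).shuffle(unique_sources)
    # Assign whole sources (not individual detections) to folds.
    fold_of = {src: i % n_folds for i, src in enumerate(unique_sources)}
    for fold in range(n_folds):
        test_rows = [i for i, s in enumerate(source_ids) if fold_of[s] == fold]
        train_rows = [i for i, s in enumerate(source_ids) if fold_of[s] != fold]
        yield train_rows, test_rows


if __name__ == "__main__":
    # Hypothetical detections: source "A" was detected three times, etc.
    ids = ["A", "A", "A", "B", "B", "C", "D", "D", "E"]
    for train_rows, test_rows in grouped_kfold(ids, n_folds=3):
        print(sorted({ids[i] for i in test_rows}), "held out")
```

Splitting on rows instead of sources would let near-duplicate detections of one source straddle the training and test sets, which is the leakage that inflated the earlier ~97% estimate.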

We expect that the higher the signal-to-noise (S/N) ratio of the X-ray detection the better RF should perform with respect to the classification of real sources, as the fractional error of the X-ray parameters will reduce with increased photon counts. We tested this assertion empirically, finding that the classification accuracy did indeed increase with the S/N of the X-ray detection (from 73% accuracy for detections with S/N < 1 to 96% accuracy for detections with S/N > 1000, evaluated through 10-fold cross-validation). To obtain the overall classification for each unique source in our main classification, we thus took the mean of the individual detection classifications for each source class, weighted by the number of photon counts in each detection. The overall classification of a unique source is taken as the source class with the highest probability. For our quality control classification we simply took the unweighted mean of the individual detection classifications, because higher photon counts do not correspond to a more precise classification: the majority of spurious detections are due to the presence of very bright nearby sources and therefore have high S/N ratios.
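The combination of per-detection results into one classification per unique source can be sketched as follows. The function name and the dictionary layout are hypothetical, and the class probabilities and photon counts in the example are made up purely for illustration.

```python
def combine_detections(detection_probs, counts=None):
    """Combine per-detection class probabilities into one source-level result.

    detection_probs: list of dicts mapping class name -> probability,
                     one dict per detection of the same unique source.
    counts:          photon counts per detection, used as weights; if None,
                     a plain (unweighted) mean is taken, as for the quality
                     control classification.
    Returns (best_class, combined_probabilities).
    """
    if counts is None:
        counts = [1.0] * len(detection_probs)
    total = float(sum(counts))
    classes = detection_probs[0].keys()
    combined = {
        c: sum(w * p[c] for p, w in zip(detection_probs, counts)) / total
        for c in classes
    }
    best_class = max(combined, key=combined.get)
    return best_class, combined


if __name__ == "__main__":
    detections = [{"STAR": 0.6, "AGN": 0.4}, {"STAR": 0.2, "AGN": 0.8}]
    # Count-weighted mean: the 900-count detection dominates.
    print(combine_detections(detections, counts=[900, 100]))
    # Unweighted mean: both detections contribute equally.
    print(combine_detections(detections))
```

In this toy case the two schemes disagree, which shows why the choice of weighting matters when a source has detections of very different quality.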


^6 A source detected in all three EPIC cameras in an observation will have at least three rows in the training set, all with identical multi-wavelength and galaxy match parameters and very similar (though not identical) X-ray features.
Table 1: List of X-ray Features Used for Classification.

Feature                 Description

X-ray Features
Inst                    EPIC instrument (pn, MOS1, or MOS2) for each detection
HR1                     Hardness ratio 1 (calculated from 0.2-0.5 keV and 0.5-1 keV count rates)
HR1_err                 Error on hardness ratio 1
HR2                     Hardness ratio 2 (calculated from 0.5-1 keV and 1-2 keV count rates)
HR2_err                 Error on hardness ratio 2
HR3                     Hardness ratio 3 (calculated from 1-2 keV and 2-4.5 keV count rates)
HR3_err                 Error on hardness ratio 3
HR4                     Hardness ratio 4 (calculated from 2-4.5 keV and 4.5-12 keV count rates)
HR4_err                 Error on hardness ratio 4
LII                     Galactic longitude (deg)
BII                     Galactic latitude (deg)
EP_8_FLUX               Band 8 0.2-12 keV flux (erg cm^-2 s^-1)
EP_EXTENT               Source extent (arcsec)
EP_EXTENT_ML            Maximum likelihood that the source is extended
DIST_NN                 Distance to the nearest 3XMM source (arcsec)
SUM_FLAG                Source quality flag
CONFUSED                Source confusion flag

X-ray Timing Features
num_flares              Number of flares in the X-ray light curve
flare_size1             Amplitude of the strongest flare (count s^-1)
flare_time1             Duration of the strongest flare (s)
ls_p1                   Period corresponding to the highest Lomb-Scargle periodogram peak (s)
ls_p2                   Period corresponding to the 2nd highest Lomb-Scargle periodogram peak (s)
ls_prob1                False alarm probability of the highest Lomb-Scargle periodogram peak
ls_prob2                False alarm probability of the 2nd highest Lomb-Scargle periodogram peak
ls_a1                   Amplitude of the most significant period in the Lomb-Scargle periodogram
ls_a2                   Amplitude of the 2nd most significant period in the Lomb-Scargle periodogram
A                       Inverse of the power law index for the power law model fit
F0                      Normalization of the best fit power law model
t0                      Time zero for power law decay fit
r_chisq                 Reduced χ^2 for the fit to the power law decay model
Amplitude               0.5 x [Max(rate) - Min(rate)] (count s^-1)
Std                     Standard deviation of the X-ray light curve
Beyond1Std              Percentage of data points in the light curve > 1σ from the weighted mean
Flux_ratio_mid20        Ratio of the flux in the 60th to 40th percentiles over the 95th to 5th percentiles
Flux_ratio_mid35        Ratio of the flux in the 67.5th to 32.5th percentiles over the 95th to 5th percentiles
Flux_ratio_mid50        Ratio of the flux in the 75th to 25th percentiles over the 95th to 5th percentiles
Flux_ratio_mid65        Ratio of the flux in the 82.5th to 17.5th percentiles over the 95th to 5th percentiles
Flux_ratio_mid80        Ratio of the flux in the 90th to 10th percentiles over the 95th to 5th percentiles
skew                    Skew of the distribution of count rates
Max_slope               Maximum slope of adjacent data points in the light curve (count s^-1)
Median_abs_dev          Median of the absolute deviation from the mean count rate in the light curve
Med_buffer_range_per    Percentage of measurements within 20% of the median
Percent_amp             Fractional difference between the highest count rate data point and the median
Per_diff_flux           Difference between the 98th percentile and 2nd percentile count rates (count s^-1)
Mod_index               Variance/weighted mean
Fvar                    Fractional rms variability of the X-ray light curve
Table 2: List of Multi-wavelength and Galaxy Match Features Used for Classification.

Feature                 Description

Optical and Near-Infrared Features
B                       NOMAD B-band magnitude
V                       NOMAD V-band magnitude
R                       NOMAD R-band magnitude
J                       NOMAD J-band magnitude
H                       NOMAD H-band magnitude
K                       NOMAD K-band magnitude
B-V                     Color calculated from NOMAD B and V-band magnitudes
V-R                     Color calculated from NOMAD V and R-band magnitudes
J-H                     Color calculated from NOMAD J and H-band magnitudes
H-K                     Color calculated from NOMAD H and K-band magnitudes
Fx/Fb                   Ratio of 0.2-12 keV X-ray to B-band flux
Fx/Fv                   Ratio of 0.2-12 keV X-ray to V-band flux
Fx/Fr                   Ratio of 0.2-12 keV X-ray to R-band flux
Fx/Fj                   Ratio of 0.2-12 keV X-ray to J-band flux
Fx/Fh                   Ratio of 0.2-12 keV X-ray to H-band flux
Fx/Fk                   Ratio of 0.2-12 keV X-ray to K-band flux
nomad_Bayes             Bayes factor for cross-match against NOMAD catalog

Radio Features
radio_flux              Radio flux density from NVSS (1.4 GHz) or SUMSS/MGPS-2 (843 MHz) catalogs (mJy)
Fx/Frad                 Ratio of 0.2-12 keV X-ray to radio flux
radio_Bayes             Bayes factor for cross-match against radio catalogs

Galaxy Features
isGalMatch              Boolean flag indicating whether or not a match against a galaxy was found
galAngSep               Angular distance of 3XMM source from the galaxy centre (arcsec)
r_ratio                 Ratio of the angular distance to the galaxy centre over the radius of the galaxy (a)
Luminosity              Log10 of the 0.2-12 keV X-ray luminosity at the galaxy distance (log10(erg s^-1))
3.1. Training Set Construction

3.1.1. Source Class Classification 


For the source class classification, we used the same sample of manually classified variable 2XMMi-DR2 sources as in Lo et al. (2014). The release of the 3XMM catalog involved a bulk reprocessing of all the XMM-Newton data and thus includes a number of improvements that have been incorporated into the pipeline (such as improved source characterisation, astrometry, and greater sensitivity). We therefore cross-matched the Lo et al. (2014) training set of 873 2XMMi-DR2 sources against 3XMM in order to take advantage of these improvements, finding a match for 869 of them.^7 In Lo et al. (2014) we separated the training set into 7 classes: AGN, CVs, GRBs, SSSs, STARs, ULXs, and XRBs. These classes made up the main types of sources identified through manual classification, and generally have very different physical properties. Although SSSs are a sub-class of CVs (with a white dwarf accreting from a binary companion undergoing steady thermonuclear burning), they have extremely soft X-ray spectra with very little emission above 1 keV and thus appear quite different to the bulk of other CVs, leading us to consider them as a different class of object in Lo et al. (2014). However, some novae (another sub-class of CVs present in our sample) have been observed to transition into a super-soft phase where they show similar X-ray properties to SSSs, but at other times look very different. We therefore combined our SSS and CV samples for this classification, as in principle there is no way that the classifier should be able to differentiate between a persistent SSS and a nova in a super-soft phase.^8 Indeed, without multiple observations of the same source over a long timescale (which would allow us to distinguish novae passing through a transient SSS phase from a persistent SSS) we are unable ourselves to discriminate between the two types of object. Table 3 shows the breakdown of the training set into our 6 source classes.

^7 Due to the improvements to the pipeline some sources present in previous versions of the catalog are not present in 3XMM or have shifted astrometry such that they do not match with 3XMM sources.

^8 In Lo et al. (2014) the classifier proved highly effective at discriminating between SSSs and CVs, despite there being a number of super-soft novae in the training sample. However, the majority of the SSSs in the sample were extragalactic while all but one of the CVs were in our own Galaxy. The classifier thus incorrectly placed significant weight on the galaxy match features.


As can be seen in Table 3 our training set is heavily unbalanced, with the number of detections of the most abundant class (stars) outnumbering the rarest class (GRBs) by a factor of ~240. This imbalance will significantly bias the model towards classifying an unknown object as the majority class (i.e. stars), leading to a higher accuracy for classifying stars but a lower accuracy for classifying the minority classes (in particular GRBs), despite rare objects being of particular interest to us. To compensate for this bias we oversampled all classes except for the stars using the SMOTE algorithm (Chawla et al. 2002), which creates synthetic minority class samples with feature values selected using the k-nearest neighbours method from within the parameter space of the real sources belonging to a given class.^9 We used the SMOTE implementation in the DMwR package (Torgo 2010) in R to oversample the AGN by a factor of 3.5, the CVs by a factor of 7, the GRBs by a factor of 200, the ULXs by a factor of 15, and the
XRBs by a factor of This oversampling thus 
provided approximately similar detection numbers for each class in the model as the dominant class of stars. To evaluate the impact of oversampling on the classification accuracies we built RF models both with and without SMOTE oversampling. We found that the overall 10-fold cross-validated classification accuracy and the accuracy for classifying the majority star class did not change significantly when oversampling was employed (they were consistent with the non-oversampled accuracies within 1%). However, the classification accuracy for the minority GRB class improved significantly (from 13% to 46%) with oversampling. As such, we chose to use oversampling for our subsequent analyses.

^9 We note that the use of SMOTE oversampling with data where flag values are employed to identify missing data (such as multi-wavelength and galaxy matches) may have unintended consequences as the flags will skew the parameter space to unphysical values. However, by selecting flags well outside the parameter space we attempted to force the model to produce values that are much closer to the flag values than real values. To test this we evaluated the classification accuracy both with and without SMOTE oversampling. We found that the RF model built with SMOTE and with flags representing missing data performed significantly better than a model without SMOTE and with imputed values for missing data when classifying the minority GRB class, and achieved similar accuracies when classifying the majority STAR class.

^10 We note that the total number of detections, not the number of unique sources, is oversampled.
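The core idea of SMOTE oversampling (Chawla et al. 2002) can be sketched in a few lines: each synthetic sample is placed at a random point on the line between a real minority class sample and one of its k nearest neighbours of the same class. This is a simplified stand-in written for illustration, not the DMwR implementation used for the catalog; the function name, the Euclidean distance, and the toy data are assumptions.

```python
import random


def smote(samples, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a list of minority class samples.

    samples: list of equal-length feature vectors (lists of floats) that all
             belong to the same class; at least two are required.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # k nearest neighbours of the base sample (squared Euclidean distance).
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the line from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Four hypothetical minority class detections with two features each.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in smote(minority, n_new=3, k=2):
        print(row)
```

Because every synthetic value is interpolated between two real values, a feature that carries the same flag value in both parents keeps that flag value exactly.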


3.1.2. Quality Control Classification 


Despite the incorporation of significant improvements in the PSF modelling (Read et al. 2011), the 3XMM pipeline source detection algorithm still occasionally detects spurious point sources around bright sources, in crowded fields, and in diffuse emission. Spurious detections can also occur due to optical loading from bright stars, as the EPIC cameras (particularly the MOS detectors) are sensitive to bright optical emission that can produce features in the images that are incorrectly detected as X-ray sources. Each source in the catalog is automatically assigned a quality flag (SUM_FLAG) by the pipeline based upon its proximity to regions that may cause issues with the reliability of source parameters/products or where spurious detections commonly occur (e.g. near a bright point source or diffuse emission).


While filtering out sources with quality flags ^ 
2 will provide a reliable sample, the way in which the flags are assigned means that such a filtering criterion will also discard bright real sources (around which spurious detections are common) that are potentially of interest to the user.


Spurious detections are easily identifiable through inspection of images and source products, so we hypothesised that spurious sources could be automatically identified via RF classification. To test this, we constructed a binary training set of non-spurious and spurious sources. For the sample of spurious sources we used the spurious sources that were identified during our manual classification of the 2,267 2XMMi-DR2 variable sources (Farrell et al. in prep). For the sample of non-spurious sources we used the same sample described in §3.1.1 that we used for our main classification (with the source class simply set to 'REAL'). As with the main classifier training set, we first cross-matched our spurious and non-spurious samples against 3XMM so as to obtain updated and improved source parameters. We used the same features for the classification as used for the source class classification. Table 3 shows the breakdown between real and spurious sources in our training set.

Our quality control sample is less unbalanced than our main training set, yet the real detections still outnumber the spurious detections significantly (Table 3). We thus oversampled the spurious source sample using the SMOTE algorithm (Chawla et al. 2002) by a factor of 4 to provide approximately the same number of detections as for the real class.^11

4. Classification Results & Verification

Table 4 shows the results of the classification of the 2,876 unknown 3XMM variable sources by source class and for quality control (full version available online). As described in §3, each detection of each unique source was classified separately and then the overall accuracy was calculated by combining the detection classifications. We obtained an overall accuracy of ~92% for our classification by source class, evaluated by 10-fold cross-validation. For the quality control classification, the overall accuracy was ~95%. When considering each detection separately the classification accuracies were lower, with ~87% for our classification by source class^12 and ~94% for our quality control classification.

Figures and show the confusion matrices for the classification by source class and the quality control classification, respectively. The number in each square in the confusion matrices represents the overall classification for each unique source compared to the actual classification obtained through manual inspection. The classification accuracy for each source class is given in Table , calculated as the number of correct source classifications over the total number of sources in each class of the training set.
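The per-class accuracy defined above (correct classifications over the total number of sources in each class) can be computed directly from paired lists of manual and predicted labels. The sketch below is illustrative only; the function name and the example labels are hypothetical.

```python
def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of sources of that class classified correctly}."""
    totals = {}
    correct = {}
    for truth, predicted in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == predicted:
            correct[truth] = correct.get(truth, 0) + 1
    return {cls: correct.get(cls, 0) / n for cls, n in totals.items()}


if __name__ == "__main__":
    # Made-up labels: manual classification versus classifier output.
    manual = ["STAR", "STAR", "STAR", "GRB", "GRB"]
    predicted = ["STAR", "STAR", "AGN", "GRB", "STAR"]
    print(per_class_accuracy(manual, predicted))
```

Reporting the accuracy per class alongside the overall value guards against a dominant class hiding poor performance on rare classes such as GRBs.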

The distribution of the probability of being spurious is bimodal, with ~50% of the unique sources having P(Spur) < 20% and ~30% having P(Spur) > 80% (see Figure ). In order to
test the overall accuracy of the quality control 


Again, we note that we oversampled the total number of 
detections not the number of unique sources. 

For comparison, we re-ran the classification using the
2XMMi-DR2 training set in Lo et al. (2014) but with
the training and test sets randomly selected based on the
unique source ID (i.e. to ensure no cross-over between the
test and training sets), and obtained an accuracy of ~92%.
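The point of splitting by unique source ID is that all detections of one source end up on the same side of every train/test split. A minimal sketch of such a grouped fold assignment follows; `grouped_folds` and the toy detection list are hypothetical names and data, not taken from the original analysis.

```python
import random
from collections import defaultdict


def grouped_folds(source_ids, n_folds=10, seed=0):
    """Assign each detection to a fold so that all detections of a source share a fold."""
    unique = sorted(set(source_ids))
    random.Random(seed).shuffle(unique)
    fold_of_source = {src: i % n_folds for i, src in enumerate(unique)}
    return [fold_of_source[src] for src in source_ids]


# Hypothetical detection list: one source ID per detection.
detections = ["A", "A", "B", "C", "C", "C", "D", "E", "E", "F"]
folds = grouped_folds(detections, n_folds=3)

# Collect the folds each source appears in; every set should have exactly one element.
by_source = defaultdict(set)
for src, fold in zip(detections, folds):
    by_source[src].add(fold)
print(dict(by_source))
```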
Table 4: 3XMM Variable Source Classifications.

3XMM Name         P_AGN  P_CV   P_GRB  P_STAR P_ULX  P_XRB  P_Max  P_Spur Class  Outlier  Margin
J000055.5+443710  0.000  0.002  0.000  0.994  0.004  0.000  0.994  0.002  STAR       0     0.987
J000209.5-300035  0.122  0.143  0.000  0.555  0.113  0.067  0.555  0.010  STAR     364     0.109
J000219.7-295607  0.001  0.003  0.000  0.996  0.000  0.000  0.996  0.000  STAR       2     0.991
J000222.8-060559  0.014  0.081  0.018  0.369  0.043  0.474  0.474  0.005  XRB       28    -0.052
J000300.6-294942  0.024  0.050  0.000  0.907  0.005  0.013  0.907  0.003  STAR      24     0.815
J000334.5-295830  0.017  0.059  0.000  0.913  0.007  0.003  0.913  0.001  STAR      16     0.826
J000354.2-255841  0.004  0.022  0.000  0.962  0.007  0.004  0.962  0.000  STAR       3     0.924
J000511.8+634018  0.058  0.115  0.017  0.583  0.050  0.176  0.583  0.008  STAR     294     0.167
J000532.8+200717  0.776  0.095  0.002  0.073  0.002  0.053  0.776  0.010  AGN       56     0.552
J000612.2+201304  0.037  0.109  0.000  0.468  0.049  0.336  0.468  0.910  STAR     594    -0.064
J000613.6+201118  0.017  0.073  0.011  0.243  0.084  0.572  0.572  0.963  XRB       50     0.143
J000613.6+201253  0.044  0.131  0.002  0.343  0.012  0.468  0.468  0.918  XRB       32    -0.063
J000618.2+201248  0.021  0.092  0.023  0.255  0.132  0.477  0.477  0.937  XRB       75    -0.045
J000621.5+201149  0.025  0.136  0.026  0.259  0.118  0.435  0.435  0.922  XRB       63    -0.129
J000627.0+200904  0.028  0.210  0.002  0.420  0.048  0.292  0.420  0.862  STAR     601    -0.160
J000631.0+200720  0.031  0.204  0.000  0.412  0.087  0.265  0.412  0.985  STAR     428    -0.175
J000634.7+200548  0.049  0.146  0.000  0.456  0.057  0.292  0.456  0.925  STAR     459    -0.087
J000635.5+200527  0.047  0.201  0.000  0.421  0.071  0.260  0.421  0.961  STAR     786    -0.158
J000638.9+200403  0.038  0.201  0.002  0.388  0.050  0.321  0.388  0.963  STAR     987    -0.225
J000639.6+200343  0.046  0.187  0.002  0.455  0.059  0.251  0.455  0.973  STAR     715    -0.090

Notes. Column 1: 3XMM name. Columns 2-7: probability given by our RF classifier that the source belongs to
one of the training set source classes. Column 8: the maximum probability of the classification by source class.
Column 9: the probability given by our quality control RF classifier that the source is spurious (averaged over all
detections). Column 10: class given to the source by our classifier (calculated as the mean classifications over all
detections weighted by the number of photon counts in that detection). Column 11: the outlier measure of the
source. Equation 10 in Lo et al. (2014) provides a definition of this parameter. Larger values indicate a higher
likelihood of being an outlier. Column 12: the classification margin of the source (Margin = 2 × P_Max - 1).
Table 3: Number of Sources and Classification Accuracy
for the Training Sets and Unknown Sample

Class      Sources  Detections  Accuracy^a
AGN             99         435         90%
CV              91         219         75%
GRB              8           8         63%
STAR           571       1,931         99%
ULX             17         110         59%
XRB             83         632         77%
REAL           867       8,841         98%
SPURIOUS       363       1,939         89%
Unknowns     2,876      18,619

^a Overall classification accuracy for each class
from 10-fold cross-validation.
Fig. 1.— Confusion matrix for the classification
by source class.


classification, we randomly selected a sample of
200 sources and manually inspected their 3XMM
products (i.e. images, light curves, spectra etc.)
to determine whether or not they were spurious.
Of these 200 sources, 96% were correctly classified
as either real (i.e. P(Spur) < 50%) or spurious
(i.e. P(Spur) ≥ 50%) by the algorithm, consistent
with the accuracy of ~95% obtained through 10-fold
cross-validation. To investigate the accuracy
as a function of P(Spur), we randomly selected 20
sources from each 10-percentile P(Spur) bin and
performed the same manual verification (see Figure 4).
As expected, a high P(Spur) corresponds
to a high probability that a source is spurious,
while the majority of sources with low P(Spur) values
are real. Taking P(Spur) < 30% should thus
provide a reliable sample of sources. To test this,
we randomly inspected 200 sources with P(Spur)
< 30%, finding that 97% were indeed real sources.

The distribution of maximum probabilities for
the classification by source class is also bimodal,
with the maximum peak around P(Max) ~ 95%
and a second broader peak around P(Max) ~ 45%
(see Figure 5). Filtering out sources with P(Spur)
> 30% primarily discards sources with low P(Max)
values, indicating as expected that RF has trouble
classifying spurious sources into real source
classes. Table 5 shows the breakdown of classified
unknown sources by source class for the entire
sample (‘all’) and for the sample with P(Spur) <
30% (‘good’). To verify the accuracy of the classifier
we manually inspected a random sample of
sources with P(Spur) < 30%, checking for identifications
within SIMBAD and NED. We identified
101 real sources that had an identification in the
literature, of which 92% were in agreement with
the classification provided by RF, consistent with
the accuracy of ~92% obtained by 10-fold
cross-validation.

We next evaluated the classifier accuracy for
each source class and as a function of P(Max),
considering only the sample of ‘good’ sources (i.e.
P(Spur) < 30%). There were ≤ 120 good sources
in each class except for the sources classified as
stars (see Table 5). We thus manually checked
all good sources that were classified as an AGN,
CV, GRB, ULX, or XRB for identifications in
SIMBAD and NED. For the stars, we checked a
sample of sources randomly selected from each
10-percentile P(Max) probability bin between 40- 
Fig. 2.— Confusion matrix for the spurious vs
non-spurious source classification.



Fig. 3.— Distribution of spurious probabilities for 
the classified unknown sample of 2,876 sources. 


Table 5: Breakdown of Classified Sources

Class     All      %      Good      %
AGN       144    5.0%      107    7.3%
CV        152    5.3%       70    4.7%
GRB         5    0.2%        5    0.3%
STAR    1,942   67.5%    1,162   78.8%
ULX        54    1.9%       11    0.7%
XRB       579   20.1%      120    8.1%
Total   2,876    100%    1,475    100%


100% until we had found 10 sources in each bin 
with a literature identification. There were only 
4 sources in the 20-30% bin so we inspected all of 
them. In the 30-40% bin there were 29 sources of 
which 11 had a match in SIMBAD or NED, all of 
which we included in our sample. The results of
this manual verification are shown in Figure 6. For
the classes with decent sampling (i.e. the AGN,
CV, STAR, and ULX classes) it is clear that (as
expected) the classification accuracy is correlated
with P(Max). This is demonstrated even more
clearly in Figure 7, which presents the accuracy
per P(Max) bin across all source classes. Selecting
classified sources with P(Spur) < 30% and P(Max)
> 60% should thus provide a clean sample of real
sources with correct classifications.
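The recommended selection can be expressed as a simple filter on the catalog columns. The sketch below applies it to a handful of rows copied from Table 4; the tuple layout and the function name `clean_sample` are illustrative assumptions, and the use of strict inequalities at the 30% and 60% boundaries is also an assumption.

```python
# (3XMM name, P_Max, P_Spur, class) for a few rows copied from Table 4
rows = [
    ("J000055.5+443710", 0.994, 0.002, "STAR"),
    ("J000209.5-300035", 0.555, 0.010, "STAR"),
    ("J000532.8+200717", 0.776, 0.010, "AGN"),
    ("J000613.6+201118", 0.572, 0.963, "XRB"),
    ("J000639.6+200343", 0.455, 0.973, "STAR"),
]


def clean_sample(rows, max_spur=0.30, min_pmax=0.60):
    """Keep sources that are unlikely to be spurious and are confidently classified."""
    return [r for r in rows if r[2] < max_spur and r[1] > min_pmax]


clean = clean_sample(rows)
for name, p_max, p_spur, cls in clean:
    print(name, cls, p_max, p_spur)
```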


The classifier performed particularly well on the 
minority GRB class. Five sources in the unknown 
sample were classified as GRBs, all of which were 
real sources with very low probabilities of being 
spurious as determined by the quality control clas¬ 
sification. Four of these objects were known GRBs 
that were the targets of the observations. The fifth 
source (3XMM J054707.6+001742) has not previously
been identified but demonstrates a power
law decay in its light curve and has a spectrum 
that could be consistent with an absorbed moder¬ 
ately steep power law, similar to what is observed 
from known GRBs. However, the spectrum also 
shows very strong iron line emission which is not 
observed from other GRBs, and its location is con¬ 
sistent with a nebula rather than a galaxy thus in¬ 
dicating that it is most likely a star rather than a 
GRB. We also investigated those sources where the 
highest classification probability did not indicate 
a GRB, but where the GRB probability was the 
second highest assigned by the classifier. Twelve 
sources met this criterion, of which 5 were real 
sources. Two of these sources were known stars, 
while another was a known GRB (again, the tar¬ 
get of the observation). The two remaining real 
sources have not previously been classified, but 
are likely to be stars due to their coincidence with
bright point-like optical sources.


In order to estimate the relative importance of
each feature, we calculated the Gini index, which
measures the total decrease in node impurities
from splitting on a given feature, averaged over
all the trees in the forest (see Equation 1 in
Lo et al. 2014). Figure 8 shows the relative feature
Fig. 4.— Results of manual verification for quality
control classification. The histograms represent
the fraction of sources that were correctly classified
by Random Forest as a function of the probability
of being spurious. There were 20 sources
evaluated per P(Spur) bin.
Fig. 5.— Distribution of maximum classification
probabilities for the classified unknown sample.
The solid red histogram shows the distribution
for the entire sample (2,876 sources), while the
dashed blue histogram shows the distribution for
the clean sub-sample (1,475 sources), i.e. sources
with P(Spur) < 0.3.



Fig. 6.— Results of manual verification by source 
class. The histograms represent the fraction of 
sources that were correctly classified by Random 
Forest as a function of the maximum probability. 
The black circles indicate the number of sources 
in each P(Max) bin. 
importance of the top 30 features for the classification
by source type. Surprisingly, the normalization
and slope of the power law model fitted
to the light curves are the most important features,
followed by the H-K near-infrared color,
the Galactic latitude, and the X-ray flux. This
differs from the Lo et al. (2014) results, which
found the five most important features to be (in
order of decreasing importance) the X-ray flux,
X-ray luminosity, X-ray hardness ratio HRS, K-band
magnitude, and the r_ratio (i.e. a). It is
probable that this discrepancy is due to the significant
increase in the number of detections in the
training set used in this work, achieved as a result
of including detections that did not have the
light curve timing features. Overall, the optical
and near-infrared features (specifically the colours
and X-ray to optical/near-infrared flux ratios) appear
to be highly informative, as do the X-ray flux
and hardness ratios. Nonetheless, the inclusion of
the timing features has a significant effect on the
model accuracy, as the 10-fold cross-validation accuracy
drops to ~85% when the timing features
are removed.
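To make the Gini-based importance more concrete, the following sketch computes the impurity decrease produced by a single split on a toy set of class labels. In a forest, decreases of this kind are accumulated over every split made on a given feature and averaged over the trees. The function names and label lists are hypothetical, and the exact weighting used in Equation 1 of Lo et al. (2014) is not reproduced here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)


parent = ["STAR"] * 6 + ["AGN"] * 4

# An informative feature separates the classes perfectly.
good = gini_decrease(parent, ["STAR"] * 6, ["AGN"] * 4)
# An uninformative feature leaves the class mix unchanged in both children.
poor = gini_decrease(parent, ["STAR"] * 3 + ["AGN"] * 2, ["STAR"] * 3 + ["AGN"] * 2)
print(round(good, 3), round(poor, 3))
```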


We also calculated the relative importance of
the features for the quality control classification
(see Figure 9). As expected, the 3XMM quality
flag (SUM_FLAG) is by far the most important
feature, followed by the probability of a
chance cross-match with a NOMAD source (i.e.
the nomad_Bayes feature), the EPIC extent maximum
likelihood and extent, and the J-H color.
Somewhat surprisingly, the distance to the nearest
neighboring 3XMM source was not an important
feature, and the 3XMM confusion flag did not rate
in the top 30 feature list at all. This indicates that
confusion with nearby sources is not a major issue
with regard to spurious source detection.


5. Outlier Sources 

In addition to classifying sources that belong
to known classes, RF can also be used to identify
sources that belong to novel classes that were not
in the training set. While these outlier sources
should have low P(Max) probabilities, this alone
is insufficient to identify truly anomalous sources
as missing information (e.g. a lack of multi-wavelength
or galaxy matches, poor S/N X-ray
data etc.) will also produce low classification
probabilities. A better method of identifying
anomalous sources is to use the outlier measure,
which represents the proximity of a given unknown
source classified by RF to the training set source
population for the same class. For each classified
unknown source we calculated the proximity matrix
and outlier measure using the randomForest
package in R (see Equation 10 in Lo et al. 2014).
We also calculated the classification margin, which
is the difference between the probability of the
source belonging to the class with P(Max) and the
probability that it does not belong to that class,
i.e. Margin = 2 × P(Max) - 1. Both the outlier
measure and classification margin are provided for
each of the classified unknown sources (see Table 4).
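The two quantities can be sketched in a few lines. The margin follows directly from the formula in the text. For the outlier measure, the sketch assumes the commonly used Random Forest definition: the number of sources divided by the sum of squared proximities to the other members of the same class, then centred on the class median and scaled by the median absolute deviation. That form, the function names, and the toy proximity matrix are assumptions; Equation 10 of Lo et al. (2014) and the R package may differ in detail.

```python
from statistics import median


def margin(p_max):
    """Classification margin as defined in the text: 2 * P(Max) - 1."""
    return 2.0 * p_max - 1.0


def outlier_measures(proximity):
    """proximity: square matrix of RF proximities among sources of one class."""
    n = len(proximity)
    raw = []
    for i in range(n):
        s = sum(proximity[i][j] ** 2 for j in range(n) if j != i)
        raw.append(n / s if s > 0 else float(n))
    med = median(raw)
    mad = median([abs(r - med) for r in raw])  # unscaled median absolute deviation
    return [(r - med) / mad if mad > 0 else 0.0 for r in raw]


# Hypothetical proximities: sources 0-2 resemble each other, source 3 does not.
prox = [
    [1.00, 0.80, 0.70, 0.05],
    [0.80, 1.00, 0.90, 0.05],
    [0.70, 0.90, 1.00, 0.10],
    [0.05, 0.05, 0.10, 1.00],
]
scores = outlier_measures(prox)
print([round(s, 1) for s in scores], round(margin(0.474), 3))
```

A source that rarely shares terminal nodes with its own class has small proximities and therefore a large outlier measure, which is the behaviour exploited in this section.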

Figure 10 shows the classification margin vs
outlier measure for the good sample of classified
unknown sources. We selected a sample of 144
good sources with a margin < -0.3 or an outlier
measure > 400 for further investigation. Of these,
4 were found to be spurious detections, 30 sources
were previously identified (20 of which were classified
correctly by RF), while the remaining 110
had no match in either SIMBAD or NED. The
sample of previously identified sources contains a
number of true rare sources that were not in the
training set. These include: a soft X-ray transient,
an isolated neutron star (one of the Magnificent
Seven), a soft gamma repeater (magnetar),
a semi-detached Beta Lyrae eclipsing binary, two
candidate intermediate mass black holes (identified
previously through their unusual variability),
and a highly unusual Seyfert 2 AGN with a ~3.8
hr period and extremely soft X-ray spectrum (possibly
hosting an intermediate mass black hole; Ho
et al. 2012). Also present in this sample is the
source 3XMM J180658.7-500250, which was previously
identified as an outlier in Lo et al. (2014)
and is also thought to be an unusual type of AGN.


A number of sources were detected in mosaic 
mode observations (primarily of Jupiter or Mars, 
where the attitude was stepped during the obser¬ 
vation) that are known to be problematic and have 
unreliable X-ray data and products. All sources in 
the affected observations appear to show the same 
highly unusual variability and are also likely to 
have unreliable astrometry, leading to issues with 
the cross-matching against multi-wavelength and 
galaxy catalogs. This combination makes them 
Fig. 9.— Relative importance of the features for the quality control classification. The higher the value of
the mean decrease in Gini impurity, the higher the relative importance of a feature.






Fig. 7.— Results of manual verification for over¬ 
all classification. The histogram represents the 
fraction of sources that were correctly classified 
by Random Forest as a function of the maximum 
probability. The black circles indicate the number 
of sources in each P(Max) bin. 



Fig. 10.— Classification margin vs outlier mea¬ 
sure for the good classified unknown sources. The 
red squares indicate the 144 sources that were in¬ 
vestigated further. 


outliers compared to the training set, though in 
this case due to data processing issues rather than 
the nature of the sources. 


Of the remaining known sources, 16 were classi¬ 
fied as stars and had low numbers of X-ray photon 
counts, leading to low S/N data and large scat¬ 
ter in their X-ray properties. Five sources were 
previously identified AGN, one was a known nova 
(possibly in a super-soft phase), and another was 
a known ULX. Many of the stars appear to have 
high proper motions such that cross-matching 
against the NOMAD catalog either found no coun¬ 
terpart or an incorrect match. Three of the outlier 
sources (all of which were classified as stars) had 
good S/N and showed truly unusual properties. 
We discuss these in the following sub-sections, 
concentrating on the detection with the highest 
outlier index. The light curves, timing analyses, 
spectra and spectral fit parameters of the other 
observations of each source are provided in the 
Appendix for completeness. 


In all cases the XMM-Newton data were reduced
using the Science Analysis Software
(SAS) v13.5 and the latest calibration files as of
2014 August 21, using the same method outlined
in Callingham et al. (2012). X-ray spectral fitting
was performed using XSPEC v12.8.1g (Arnaud
1996) over energies between 0.3 - 10 keV, and
the spectra were binned at 20 counts per bin to
provide sufficient statistics for fitting. Photoelectric
absorption was accounted for using the
phabs model in XSPEC with the Wilms et al. (2000)
elemental abundances.


5.1. 3XMM J184430.9-024434: A Supergiant
Fast X-ray Transient?

3XMM J184430.9-024434 was observed twice
with XMM-Newton on the 15th and 16th of April
2010, and has two detections with the MOS2
camera in 3XMM (it fell off the chip in the pn
and MOS1 exposures for both observations). The
highest outlier measure of 1517 is in the first observation
and it has a classifier margin of -0.02.
It has a 3XMM 0.2-12 keV flux of ~8 × 10^[?]
erg cm^-2 s^-1 and lies within the Galactic plane,
~30° from the Galactic centre. No counterpart
was found in NOMAD or the radio catalogs within
the 3σ positional errors, although there is a NOMAD
source 2.8" from the 3XMM position with
B = 19.1 mag, R = 17.3 mag, J = 14.8 mag, H =
13.9 mag, and K = 13.4 mag.

The 3XMM light curves show 5 distinct short,
sharp flares in the first observation (Figure 11) and
a single large flare in the second, reaching count
rates of ~0.1 count s^-1 before dropping back to
zero. Such flares are reminiscent of supergiant fast
X-ray transients (SFXTs), a rare sub-class of high
mass X-ray binary (HMXB) of which ~10 sources
are currently known (see Sidoli 2014 for a recent
review). SFXTs are characterised by short duration
X-ray flares (typically lasting ~10^[?] - 10^[?] s)
produced by transient accretion onto a compact
object (typically a neutron star) from the wind of
a blue supergiant companion (Sidoli 2014). The
dense wind environment produces high levels of
photoelectric absorption in their X-ray spectra
and many SFXTs contain X-ray pulsars with spin
periods of ~10 - 10^[?] s.

We extracted source and background light
curves (from circular regions of radii 25") binned
at the frame time of 2.6 s, filtering out times with
high background flaring. We corrected and background
subtracted the light curves using the SAS
task epiclccorr and applied a barycentric correction.
We then searched for periodic variability
using the fasper implementation of the Lomb-Scargle
periodogram (Press & Rybicki 1989), and
used Monte Carlo simulations to determine the
99% white noise significance levels (e.g. Kong et
al. 1998). No evidence of periodic modulation was
found in the power spectra of either observation,
though significant power was observed at low frequencies
due to the flaring (see Figure 12).
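The period search described above can be illustrated with a small, self-contained sketch. It evaluates the classic Lomb-Scargle power directly (rather than with the fast fasper algorithm) and estimates a 99% white-noise level by shuffling the light curve, which is one simple way of simulating white noise and is an assumption here, as are the function names, the number of simulations, and the synthetic 100 s signal.

```python
import math
import random


def lomb_scargle(t, y, freqs):
    """Classic normalised Lomb-Scargle power at each trial frequency (Hz)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # Time offset tau that makes the sine and cosine terms orthogonal
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc = sum((v - mean) * ci for v, ci in zip(y, c))
        ys = sum((v - mean) * si for v, si in zip(y, s))
        power.append((yc ** 2 / sum(ci * ci for ci in c)
                      + ys ** 2 / sum(si * si for si in s)) / (2.0 * var))
    return power


def white_noise_level(t, y, freqs, n_sims=100, quantile=0.99, seed=1):
    """Quantile of the maximum power found in shuffled (white-noise-like) light curves."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        shuffled = y[:]
        rng.shuffle(shuffled)
        maxima.append(max(lomb_scargle(t, shuffled, freqs)))
    maxima.sort()
    return maxima[min(int(quantile * n_sims), n_sims - 1)]


# Synthetic light curve: 100 s sinusoid plus Gaussian noise, 10 s bins.
rng = random.Random(0)
t = [10.0 * i for i in range(120)]
y = [1.0 + 0.5 * math.sin(2 * math.pi * ti / 100.0) + rng.gauss(0.0, 0.3) for ti in t]
freqs = [k / 1200.0 for k in range(1, 31)]

power = lomb_scargle(t, y, freqs)
best_period = 1.0 / freqs[power.index(max(power))]
threshold = white_noise_level(t, y, freqs)
print(f"best period = {best_period:.0f} s, significant = {max(power) > threshold}")
```

Strong aperiodic flaring adds power at low frequencies, so a white-noise threshold of this kind is only a guide there.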

We also extracted spectra from circular source
(radius = 25") and background (radius = 75")
regions and generated response and ancillary response
files. We fitted the spectrum from the first
observation with a number of simple models including
a power law, black body, bremsstrahlung,
and thermal plasma (all with photoelectric absorption
components). We obtained the best fit with a
power law model with high levels of photoelectric
absorption (χ²/dof = 17.4/19). The spectrum of
the second observation was well fitted by a similar
model with χ²/dof = 5.6/11. Figure 13 shows
the spectrum from observation 1 fitted with an absorbed
power law. A table in the Appendix lists
the spectral parameters for the best-fit models.
The X-ray spectra and flare behavior are consistent
with this source being a new member of the


Fig. 11.— EPIC MOS2 light curve (bin
size = 600 s) of the candidate SFXT 3XMM
J184430.9-024434 from observation 1.



Fig. 12.— Lomb-Scargle power spectrum of
the MOS2 light curve (binned at the frame
time of 2.6 s) of the candidate SFXT 3XMM
J184430.9-024434 from observation 1. The 99%
significance level is indicated by the dashed line.
SFXT class.


5.2. 3XMM J181923.7-170616: A Slow 
X-ray Pulsar? 

3XMM J181923.7-170616 was observed three
times with XMM-Newton on the 7th of October
2006 (obsid = 0402470101), the 21st of March 2010
(obsid = 0604820101), and the 21st of March 2013
(obsid = 0693900101). Only the first two of these
observations are in 3XMM, with a total of 6 EPIC
detections. The highest outlier measure of 1782 is
from the pn detection in the second observation
and it has a classifier margin of -0.10. It has a
3XMM 0.2-12 keV flux of 5 × 10^[?] erg cm^-2
s^-1 and lies within the Galactic plane, ~14° from
the Galactic centre. It has a NOMAD counterpart
with J = 16.0 mag, H = 14.0 mag, and K = 13.5
mag but no match in the radio or galaxy catalogs.

The 3XMM light curves show no obvious features,
though the source is moderately variable with a
fractional variability amplitude of ~0.3 (Figure 14 and
additional figures in the Appendix). We extracted
source and background light curves (from circular
regions of radii 30") from the pn data binned at
the frame time of 73.4 ms, filtering out times with
high background flaring. We then corrected and
background subtracted the light curves using the
same method as described in §5.1 and searched for
periodic variability using the fasper implementation
of the Lomb-Scargle periodogram. A significant
peak was found in the power spectrum of the
observation 2 pn light curve at a period of 400 s
(Figure 15). The profile of the light curve when
folded over a period of 400 s (using the efold task
in the FTOOLS software package) is roughly sinusoidal
(see Figure 16). The same periodic variability
was also detected in the other two observations.
This 400 s period was not detected in our automatic
analysis of the 3XMM light curves that was
used to generate the timing features. However, the
generalised Lomb-Scargle method used to generate
our timing features searched for periods only down
to four times the bin width (i.e. 600 s), insufficient
to detect a period of ~400 s given that the light
curve was binned at 150 s.
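Both points in this paragraph, the shortest period reachable by the automatic search and the folding of a light curve on a trial period, can be sketched briefly. The function names, the eight phase bins, and the synthetic 400 s sinusoid are illustrative assumptions; the folding here is a plain phase-binned average rather than a reproduction of the efold task.

```python
import math


def min_searchable_period(bin_width, factor=4):
    """Shortest period searched when the limit is `factor` times the bin width."""
    return factor * bin_width


def fold(times, rates, period, n_bins=8):
    """Average count rate in each phase bin after folding on `period`."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, r in zip(times, rates):
        phase = (t % period) / period
        b = min(int(phase * n_bins), n_bins - 1)
        sums[b] += r
        counts[b] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# With 150 s bins the automatic search stops at 600 s, above the 400 s signal.
limit = min_searchable_period(150)
print(limit, limit > 400)

# Synthetic 400 s sinusoid sampled every 10 s, folded on the true period.
times = [10 * i for i in range(400)]
rates = [1.0 + 0.3 * math.sin(2 * math.pi * t / 400.0) for t in times]
profile = fold(times, rates, period=400)
print([round(p, 2) for p in profile])
```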

We extracted spectra from source (radius =
30") and background (radius = 90") regions and
generated response and ancillary response files.
We fitted the pn, MOS1, and MOS2 spectra
from observation 2 simultaneously with simple


Fig. 13.— EPIC MOS2 X-ray spectrum of the
candidate SFXT 3XMM J184430.9-024434 fitted
with an absorbed power law.



Fig. 14.— EPIC pn light curve (bin size =
150 s) of the candidate slow pulsar 3XMM
J181923.7-170616 from observation 2.
absorbed power law, black body, bremsstrahlung,
and thermal plasma models. An additional constant
multiplicative component (frozen at 1 for
the pn spectrum) was included to account for differences
in the instrument responses. The best fit
was obtained with the power law model (χ²/dof =
481.9/377), though the fit is not statistically acceptable
and the residuals indicate that the model
did not adequately represent the data above ~6
keV.


The addition of a high energy exponential cut-off
improved the fit (χ²/dof = 454.5/375), while
adding an additional Gaussian line at ~6.7 keV
(representing helium-like iron emission) improved
it further (χ²/dof = 431.5/372). To test whether
the addition of these model components is statistically
justifiable we calculated the Bayesian information
criterion (BIC) for each model. We found
that the BIC was lowest for the simple absorbed
power law model, indicating that adding the high
energy cut-off and the Gaussian emission line is
not statistically justified. The poor fit residuals
could, however, be indicative of spectral variability
within the observation. Further analysis is needed
to better constrain the nature of the X-ray emission.
Figure 17 shows the spectra of observation 2
fitted with the simple absorbed power law model,
while the corresponding table presents the parameters
of this fit. The spectra from observations 1 and 3
were also well described by an absorbed power law.
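The idea behind the BIC comparison is that each extra free parameter must buy a sufficiently large improvement in the fit statistic. The sketch below uses one common form for χ² fits, BIC = χ² + k ln N, with k free parameters and N data points. The text does not state which form or which N was adopted, so both the formula and the numbers are assumptions chosen purely to show the mechanics.

```python
import math


def bic(chi2, n_params, n_points):
    """BIC for a chi-square fit: fit statistic plus a penalty per free parameter."""
    return chi2 + n_params * math.log(n_points)


# Hypothetical fits to the same 50 spectral bins: (chi-square, free parameters).
models = {"simple": (52.0, 3), "extended": (49.0, 5)}
n_points = 50

scores = {name: bic(chi2, k, n_points) for name, (chi2, k) in models.items()}
preferred = min(scores, key=scores.get)
print({name: round(s, 2) for name, s in scores.items()}, preferred)
```

In this toy case the extended model lowers χ² by only 3 for two additional parameters, which does not offset the penalty of 2 ln 50, so the simpler model is preferred.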


The ~400 s period could represent the spin of a
compact object, either a white dwarf (i.e. a CV) or
neutron star (i.e. an XRB). However, the spectra
(in particular the moderately high absorption) are
more consistent with a HMXB, a number of which
are known to contain slowly spinning neutron stars
with periods of hundreds of seconds (e.g. Ikhsanov
et al.).


5.3. 3XMM J181355.6-324237: An Eclipsing
Binary?

3XMM J181355.6-324237 was observed three
times by XMM-Newton on the 15th, 17th, and
19th of September 2009 (obsids = 0604860201,
0604860301, and 0604860401). In all three observations
the MOS cameras were in large window
mode, meaning that the source fell off the chip in
all observations. There are therefore only 3 pn detections
in 3XMM. The highest outlier measure of
523 is from the first observation, and the classification
margin is -0.[?]. The 3XMM 0.2-12 keV flux
is ~4 × 10^[?] erg cm^-2 s^-1 in the first two
observations, but drops to ~3 × 10^[?] erg cm^-2
s^-1 in the third. It is located towards the Galactic
centre but ~7° below the plane, and is coincident
with the Cepheid variable star V2719 Sgr.
It has no match in NOMAD within the 3σ positional
errors, but there are two NOMAD sources
~1.5" away. Inspection of the XMM-Newton optical
monitor images found a single source coincident
with the X-ray position, indicating that there
are likely some issues with the NOMAD astrometry
and/or that the star has a high proper motion.
The NOMAD source 0572-1005906 has B = 12.6
mag, V = 16.3 mag, R = 17.3 mag, J = 15.5 mag,
H = 14.9 mag, and K = 14.9 mag.

Fig. 15.— Lomb-Scargle power spectrum of the
pn light curve (binned at the frame time of
73.4 ms) of the candidate slow pulsar 3XMM
J181923.7-170616 from observation 2. The 99%
significance level is indicated by the dashed line.

Fig. 16.— EPIC pn light curve of the candidate
slow pulsar 3XMM J181923.7-170616 from observation
1 folded over a period of 400 s.

The 3XMM light curves show clear evidence for
periodic dips over a period of ~20 ks in the first
two observations (see Figure 18) where the count
rate drops to zero for ~1 ks. No evidence of dips is
seen in the third observation, though this is likely
due to the poorer S/N resulting from the significant
drop in flux. Our automatic timing analysis
routine indicated the presence of significant periodic
variability with a period of 18,523 s with
a false alarm probability of 10^[?]. To test for
higher frequency variability, we extracted source
and background light curves from all three pn exposures
at the frame time of 73.4 ms (from circular
regions of radii 15" for observations 1 and
2, and 12" for observation 3), correcting them as
described above. A strong peak was detected at a
period of ~18 ks in the power spectra of the
observation 1 (Figure 19) and 2 light curves, but no
evidence was found for periodic variability at any
other frequency. No periodic variability was detected
at any frequency in the observation 3 light
curve. The profile of the observation 1 light curve
folded over a period of 18 ks is shown in Figure 20.
A clear dip is seen out of phase with the light
curve maximum. Such dips are typical of eclipsing
binary systems such as CVs and XRBs.


We extracted spectra using the same source regions
as for the light curves but with background
regions of radii 45" for observations 1 and 2 and
36" for observation 3. We fitted the spectrum
from observation 1 with the same simple models
as before, obtaining the best fits with absorbed
black body (χ²/dof = 189.3/127) and power law
(χ²/dof = 193.5/127) models. The residuals indicated
problems at high energies for the black body
fit and at low energies for the power law model.
Fitting the spectrum with a combined power law
plus low temperature black body model obtained a
better fit (χ²/dof = 162.4/125), while the addition
of a Gaussian emission line at ~6.7 keV improved
the fit even further (χ²/dof = 145.4/121). We calculated
the BIC for each model, finding that it was
lowest for the simple absorbed black body model,
indicating that the power law and the Gaussian
emission line are not statistically required. As
with our candidate slow pulsar, further analysis
is needed to better constrain the nature of the X-ray
emission. The spectrum of observation 1 fitted
with the simple absorbed black body model
is shown in the accompanying figure, while the
corresponding table presents the spectral parameters
of this fit. The spectrum in observation 2 is very
similar to that of observation 1. However, in
observation 3 the spectrum changed significantly in
shape such that a simple absorbed black body model
was no longer an acceptable fit (χ²/dof = 57.9/22)
and an absorbed power law provides a much better
approximation of the data (χ²/dof = 25.1/22).

Fig. 17.— EPIC X-ray spectra (black = pn, red =
MOS1, green = MOS2) from observation 2 of the
candidate slow pulsar 3XMM J181923.7-170616
fitted with an absorbed power law model.

Fig. 18.— EPIC pn light curve (bin size =
560 s) of the candidate eclipsing binary 3XMM
J181355.6-324237 from observation 1.

The presence of periodic sharp dips in the light
curves is consistent with an eclipsing binary system,
implying an orbital period of ~18 ks and an
orbital radius of ~10^6 km (assuming a combined
mass of ~10 M☉ for both stars). However,
a Cepheid would not fit inside such a tight orbit
as Galactic Cepheids have been found to have
radii ~10^7 - 10^8 km (e.g. Gieren et al. 1998).
It is possible that the Cepheid classification for
V2719 Sgr was incorrect, or that it is aligned by
chance with 3XMM J181355.6-324237. Alternatively,
if V2719 Sgr truly is a Cepheid and associated
with this 3XMM source, it could be a hierarchical
triple system with a compact binary orbited
by a Cepheid in a wider orbit. The X-ray spectra
are most consistent with an XRB; however, we cannot
rule out a CV on these data alone. Regardless,
3XMM J181355.6-324237 appears likely to be a
new eclipsing binary system.
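The orbital scale quoted above follows from Kepler's third law, a^3 = G M P^2 / (4 pi^2). The short calculation below evaluates it for an 18 ks period and two illustrative total masses; the function name and the choice of 1 and 10 solar masses are assumptions made to bracket the estimate.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg


def orbital_separation_km(period_s, total_mass_msun):
    """Semi-major axis from Kepler's third law, in km."""
    a_m = (G * total_mass_msun * M_SUN * period_s ** 2 / (4 * math.pi ** 2)) ** (1 / 3)
    return a_m / 1e3


separations = {m: orbital_separation_km(18e3, m) for m in (1.0, 10.0)}
for mass, a_km in separations.items():
    print(f"M = {mass:4.1f} Msun -> a = {a_km:.2e} km")
```

Both cases give a separation of order 10^6 km, comparable to the radius of the Sun (about 7 × 10^5 km) and far smaller than a Cepheid with a radius of tens of solar radii.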


Gieren et al. 


1998 



Period (s) 

Fig. 19.— Lomb-Scargle power spectrum of the
EPIC pn light curve (binned at the frame time of
73.4 ms) of the candidate eclipsing binary 3XMM
J181355.6-324237 from observation 1. The 99%
significance level is indicated by the dashed line.
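
Once a periodogram such as the one in Fig. 19 points to a candidate period, the light curve can be folded on that period to bring out the eclipse profile, as in Fig. 20. The sketch below shows the folding step on a synthetic light curve in plain Python. The function name, the number of phase bins, and the toy eclipse shape are all illustrative assumptions and do not reproduce the reduction used for the figures.

```python
import math
import random


def fold_light_curve(times, rates, period, n_bins=20):
    """Return phase-bin centres and the mean rate in each phase bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, r in zip(times, rates):
        phase = (t % period) / period              # phase in [0, 1)
        i = min(int(phase * n_bins), n_bins - 1)   # guard against rounding to n_bins
        sums[i] += r
        counts[i] += 1
    centres = [(i + 0.5) / n_bins for i in range(n_bins)]
    means = [s / c if c else math.nan for s, c in zip(sums, counts)]
    return centres, means


# Synthetic data only: a flat source with a dip centred on phase 0.5.
random.seed(0)
period = 18e3                                      # seconds
times = [100.0 * k for k in range(600)]            # 60 ks sampled every 100 s
rates = []
for t in times:
    phase = (t % period) / period
    level = 0.2 if abs(phase - 0.5) < 0.05 else 1.0
    rates.append(level + random.gauss(0.0, 0.05))

centres, profile = fold_light_curve(times, rates, period)
dip = min(range(len(profile)), key=lambda i: profile[i])
print(f"deepest phase bin is centred at phase {centres[dip]:.3f}")
```

Folding on a wrong trial period would smear the dip across many phase bins, which is why a sharp folded profile supports the period identified in the power spectrum.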



Fig. 20.— EPIC pn light curve of the candidate
eclipsing binary 3XMM J181355.6-324237 from
observation 1 folded over a period of 18 ks.

6. Summary and Conclusions

We have applied the RF machine learning
algorithm to automatically classify 2,876 variable
X-ray sources in the 3XMM catalog. We obtained
a classification accuracy of ~92% when classifying
by source class. We also tested the application
of RF for quality control classification (i.e.
identifying spurious sources), obtaining an
accuracy of ~95%. Manual investigation of classified
unknown sources found that 96% of a sample of
200 randomly selected sources were correctly
classified as either spurious or non-spurious. We also
manually tested the accuracy of the classification
by source class for a random sample of sources
with P(Spur) < 30%, finding that 92% of a sample
of 101 sources with previous identifications in
the literature were correctly classified. As with
our previous work (Lo et al. 2014), we found that
RF had trouble classifying sources belonging to
classes that were not adequately sampled in the
training set (e.g. GRBs). Regardless, selecting
sources with P(Max) > 60% and P(Spur) < 30%
should provide a clean sample of real sources with
predominantly correct classifications.
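
The recommended selection can be applied as a simple filter on the classified catalog. The sketch below is a minimal Python illustration that reads the cuts as P(Max) above 60% and P(Spur) below 30%; whether the boundaries are strict is an assumption. The record layout, field names, and example rows are hypothetical and do not correspond to the actual catalog columns.

```python
from dataclasses import dataclass


@dataclass
class ClassifiedSource:
    name: str         # hypothetical source label
    best_class: str   # class with the highest RF probability
    p_max: float      # highest class probability, in the range 0-1
    p_spur: float     # probability of being spurious, in the range 0-1


def select_clean(sources, min_p_max=0.60, max_p_spur=0.30):
    """Keep sources with a confident class and a low spurious probability."""
    return [s for s in sources
            if s.p_max > min_p_max and s.p_spur < max_p_spur]


# Made-up rows for illustration only.
catalog = [
    ClassifiedSource("src-A", "AGN", 0.85, 0.05),    # passes both cuts
    ClassifiedSource("src-B", "XRB", 0.45, 0.10),    # class too uncertain
    ClassifiedSource("src-C", "Star", 0.70, 0.55),   # likely spurious
]

clean = select_clean(catalog)
print([s.name for s in clean])
```

Keeping the two thresholds as keyword arguments makes it easy to trade completeness against purity for a given science case.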


We also investigated a sample of 144 anomalous
good sources with outlier measures > 400
or classifier margins < -0.3. We identified a
number of truly rare sources that were not
represented in the training set (e.g. isolated neutron
stars, magnetars, unusual AGN etc.), validating
the effectiveness of the RF classifier for
identifying members of rare outlier source populations.
We also identified a number of sources with
unreliable X-ray parameters and/or astrometry that
were detected in problematic mosaic mode
observations. In addition, we identified three previously
unstudied sources that appear to be truly unique
objects, including a new candidate SFXT, a new
candidate 400 s slow pulsar, and an eclipsing
compact binary system with a 5 hr orbital period that
may be the inner binary of a hierarchical triple
system containing a Cepheid variable. Additional
work beyond the scope of this paper is underway
to further investigate these unique objects.

Fig. 21.— EPIC pn spectrum of the candidate
eclipsing binary 3XMM J181355.6-324237 from
observation 1 fitted with an absorbed black body
model.

A. Appendix

Here we provide light curves, Lomb-Scargle power spectra, and spectra for the additional XMM-Newton
observations of the three outlier sources discussed in §5. We also provide tables giving the best-fit spectral
models for all EPIC spectra for these sources.

A.1. The Candidate SFXT 3XMM J184430.9-024434


Table 6: Parameters of the best-fit models fitted to the EPIC spectra of the candidate SFXT 3XMM
J184430.9-024434.


Parameter 

Observation 1 

Observation 2 

Units 

nH 


' —3 

10^^ atom cm“^ 

F 

Elux^ 

7±1 

1 O+0-9 
-'-•^-0.7 

6±1 

10“^^ erg cm“^ s“^ 

X^/dof 

17.4/19 

5.6/11 



^a Absorbed flux in the 0.2-10 keV band.

Fig. 22.— Top: EPIC MOS2 light curve (bin size = 680 s) of the candidate SFXT 3XMM J184430.9-024434
from observation 2. Middle: Lomb-Scargle power spectrum of the MOS2 light curve (binned at the frame
time of 2.6 s) of the candidate SFXT 3XMM J184430.9-024434 from observation 2. The 99% significance
level is indicated by the dashed line. Bottom: EPIC MOS2 X-ray spectrum of the candidate SFXT 3XMM
J184430.9-024434 from observation 2 fitted with an absorbed power law.

A.2. The Candidate Slow Pulsar 3XMM J181923.7-170616


Table 7: Parameters of the best-fit models fitted to the EPIC spectra of the candidate slow pulsar 3XMM
J181923.7-170616.

Parameter   Observation 1   Observation 2   Observation 3   Units
nH          [?]             1.5±0.2         1.2±0.2         10^22 atom cm^-2
Γ           0.4±0.1         0.56±0.07       0.4±0.1
Flux^a      [?]             [?]             [?]             10^-[?] erg cm^-2 s^-1
χ²/dof      110.0/94        481.9/377       157.2/127

^a Absorbed flux in the 0.2-10 keV band.

Fig. 23.— Top: EPIC MOS1 light curve (bin size = 110 s) of the candidate slow pulsar 3XMM
J181923.7-170616 from observation 1. Middle: Lomb-Scargle power spectrum of the MOS1 light curve
(binned at the frame time of 2.6 s) of the candidate slow pulsar 3XMM J181923.7-170616 from observation
1. The 99% significance level is indicated by the dashed line. Bottom: EPIC X-ray spectra (black = pn, red
= MOS1, green = MOS2) of the candidate slow pulsar 3XMM J181923.7-170616 from observation 1 fitted
with an absorbed power law model.

Fig. 24.— Top: EPIC MOS1 light curve (bin size = 110 s) of the candidate slow pulsar 3XMM
J181923.7-170616 from observation 3. Middle: Lomb-Scargle power spectrum of the MOS1 light curve
(binned at the frame time of 2.6 s) of the candidate slow pulsar 3XMM J181923.7-170616 from observation
3. The 99% significance level is indicated by the dashed line. Bottom: EPIC X-ray spectra (black = MOS1,
red = MOS2) of the candidate slow pulsar 3XMM J181923.7-170616 from observation 3 fitted with an
absorbed power law model.

A.3. The Candidate Eclipsing Binary 3XMM J181355.6-324237


Table 8: Parameters of the best-fit models fitted to the EPIC spectra of the candidate eclipsing binary 3XMM
J181355.6-324237.

Parameter   Observation 1       Observation 2           Observation 3         Units
nH          0.09 (+[?], -[?])   0.002 (+[?], -0.002)    [?] (+[?], -0.09)     10^22 atom cm^-2
kT          1.34±0.06           [?] (+0.05, -0.06)
Γ                                                       [?] (+0.4, -0.3)
Flux^a      2.8±0.2             2.5±0.1                 [?] (+0.05, -0.04)    10^-[?] erg cm^-2 s^-1
χ²/dof      181.9/126           200.7/135               25.1/22

^a Absorbed flux in the 0.2-10 keV band.

Fig. 25.— Top: EPIC pn light curve of the candidate eclipsing binary 3XMM J181355.6-324237 from
observation 2. Middle: Lomb-Scargle power spectrum of the EPIC pn light curve (binned at 100 s) of
the candidate eclipsing binary 3XMM J181355.6-324237 from observation 2. The 99% significance level
is indicated by the dashed line. Bottom: EPIC pn spectrum of the candidate eclipsing binary 3XMM
J181355.6-324237 from observation 2 fitted with an absorbed black body model.

Fig. 26.— Top: EPIC pn light curve of the candidate eclipsing binary 3XMM J181355.6-324237 from
observation 3. Middle: Lomb-Scargle power spectrum of the EPIC pn light curve (binned at 100 s) of
the candidate eclipsing binary 3XMM J181355.6-324237 from observation 3. The 99% significance level is
greater than 5 x 10^-[?] and is thus not shown. Bottom: EPIC pn spectrum of the candidate eclipsing binary
3XMM J181355.6-324237 from observation 3 fitted with an absorbed power law.